Hands-on Exercise 3A: 1st Order Spatial Point Patterns Analysis Methods

Published

January 17, 2024

Modified

January 24, 2024

1.0 Introduction

1.1 Getting Started

In this hands-on exercise, we will be using the following packages:

  • maptools for manipulating geographic data

    (Note: We will use maptools to convert Spatial objects into ppp format of spatstat)

  • raster for reading, writing, manipulating, analyzing and modelling of gridded spatial data

    (Note: We will use raster to convert image output generated by spatstat into raster format)

  • sf for handling geospatial data

  • spatstat for point pattern analysis

    (Note: We will use spatstat to perform 1st order spatial point patterns analysis and derive kernel density estimation (KDE) layer)

  • tmap for creating thematic maps such as choropleth and proportional symbol maps,

install.packages("maptools", repos = "https://packagemanager.posit.co/cran/2023-10-13")
package 'maptools' successfully unpacked and MD5 sums checked

The downloaded binary packages are in
    C:\Users\user\AppData\Local\Temp\Rtmpi4HJ43\downloaded_packages
pacman::p_load(maptools, raster, sf, spatstat, tmap)

2.0 Data Acquisition

We will be using 2 datasets in this exercise:

2.1 Extracting Geospatial Data Sets

Following a structure similar to Hands-on Exercise 01, start by creating a new folder labeled Hands-on_Ex03. Within this folder, create a sub-folder named data. Inside the data sub-folder, create one additional sub-folders and rename them geospatial.

Unzip MasterPlan2014SubzoneBoundaryWebSHP.zip and place all the unzipped files, PreSchoolsLocation.geojson into the geospatial sub-folder.

3.0 Geospatial Data Handling

3.1 Importing Geospatial Data

In the previous exercises, we have learnt to import geospatial data into RStudio by using st_read() of sf package. Let’s try it now!

childcare_sf <- st_read("data/geospatial/PreSchoolsLocation.geojson")
Reading layer `PreSchoolsLocation' from data source 
  `C:\kt526\IS415-GAA\Hands-on_Ex\Hands-on_Ex03\data\geospatial\PreSchoolsLocation.geojson' 
  using driver `GeoJSON'
Simple feature collection with 2290 features and 2 fields
Geometry type: POINT
Dimension:     XYZ
Bounding box:  xmin: 103.6878 ymin: 1.247759 xmax: 103.9897 ymax: 1.462134
z_range:       zmin: 0 zmax: 0
Geodetic CRS:  WGS 84
mpsz_sf <- st_read(dsn = "data/geospatial", 
                layer = "MP14_SUBZONE_WEB_PL")
Reading layer `MP14_SUBZONE_WEB_PL' from data source 
  `C:\kt526\IS415-GAA\Hands-on_Ex\Hands-on_Ex03\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 323 features and 15 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
Projected CRS: SVY21
coastaloutline_sf <- st_union(mpsz_sf)

3.2 Checking the Contents of A Simple Feature Data Frame

3.2.1 Using st_geometry() to check for inappropriate coordinate systems

st_geometry(childcare_sf)
Geometry set for 2290 features 
Geometry type: POINT
Dimension:     XYZ
Bounding box:  xmin: 103.6878 ymin: 1.247759 xmax: 103.9897 ymax: 1.462134
z_range:       zmin: 0 zmax: 0
Geodetic CRS:  WGS 84
First 5 geometries:
st_geometry(mpsz_sf)
Geometry set for 323 features 
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
Projected CRS: SVY21
First 5 geometries:
st_geometry(coastaloutline_sf)
Geometry set for 1 feature 
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
Projected CRS: SVY21

You might have observed variations in the coordinate systems among the data frames. Recall in Hands-on Exercise 01, it is a common practice to transform original data from geographical coordinate system to projected coordinate system.

  • childcare_sf uses WGS 84 (geographic coordinate system)

  • mpsz_sf and coastal_outline_sf uses uses SVY21 (projected coordinate system)

Let’s perform projection transformation on childcare_sf and sgsz_sf using st_transform() of sf package.

childcare3414 <- st_transform(childcare_sf, 
                              crs = 3414)

Now, we will display the contents of childcare3414.

st_geometry(childcare3414)
Geometry set for 2290 features 
Geometry type: POINT
Dimension:     XYZ
Bounding box:  xmin: 11810.03 ymin: 25596.33 xmax: 45404.24 ymax: 49300.88
z_range:       zmin: 0 zmax: 0
Projected CRS: SVY21 / Singapore TM
First 5 geometries:

Notice that childcare3414 is now in SVY21 projected coordinate system.

3.2.2 Using st_crs() to check for missing/inaccurate coordinate systems

DIY: Using the appropriate sf function you learned in Hands-on Exercise 2, retrieve the referencing system information of these geospatial data.

To retrieve coordinate reference system from sf object, we need to use st_crs() from sf package.

st_crs(childcare3414)
Coordinate Reference System:
  User input: EPSG:3414 
  wkt:
PROJCRS["SVY21 / Singapore TM",
    BASEGEOGCRS["SVY21",
        DATUM["SVY21",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4757]],
    CONVERSION["Singapore Transverse Mercator",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["northing (N)",north,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["easting (E)",east,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Cadastre, engineering survey, topographic mapping."],
        AREA["Singapore - onshore and offshore."],
        BBOX[1.13,103.59,1.47,104.07]],
    ID["EPSG",3414]]
st_crs(mpsz_sf)
Coordinate Reference System:
  User input: SVY21 
  wkt:
PROJCRS["SVY21",
    BASEGEOGCRS["SVY21[WGS84]",
        DATUM["World Geodetic System 1984",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]],
            ID["EPSG",6326]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["Degree",0.0174532925199433]]],
    CONVERSION["unnamed",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["Degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["Degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["(E)",east,
            ORDER[1],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]],
        AXIS["(N)",north,
            ORDER[2],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]]]
st_crs(coastaloutline_sf)
Coordinate Reference System:
  User input: SVY21 
  wkt:
PROJCRS["SVY21",
    BASEGEOGCRS["SVY21[WGS84]",
        DATUM["World Geodetic System 1984",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]],
            ID["EPSG",6326]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["Degree",0.0174532925199433]]],
    CONVERSION["unnamed",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["Degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["Degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["(E)",east,
            ORDER[1],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]],
        AXIS["(N)",north,
            ORDER[2],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]]]

Although mpsz_sf and coastaloutline_sf are projected in SVY21, the output near the end indicates that EPSG is 9001. This is a wrong EPSG code. The correct EPSG code for SVY21 should be 3414. Let’s change it using st_transform() of sf package.

DIY: Using the method you learned in Lesson 2, assign the correct crs to mpsz_sf and coastaloutline_sf.

mpsz3414 <- st_transform(mpsz_sf, 3414)
st_crs(mpsz3414)
Coordinate Reference System:
  User input: EPSG:3414 
  wkt:
PROJCRS["SVY21 / Singapore TM",
    BASEGEOGCRS["SVY21",
        DATUM["SVY21",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4757]],
    CONVERSION["Singapore Transverse Mercator",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["northing (N)",north,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["easting (E)",east,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Cadastre, engineering survey, topographic mapping."],
        AREA["Singapore - onshore and offshore."],
        BBOX[1.13,103.59,1.47,104.07]],
    ID["EPSG",3414]]
coastaloutline3414 <- st_transform(coastaloutline_sf, 3414)
st_crs(coastaloutline3414)
Coordinate Reference System:
  User input: EPSG:3414 
  wkt:
PROJCRS["SVY21 / Singapore TM",
    BASEGEOGCRS["SVY21",
        DATUM["SVY21",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4757]],
    CONVERSION["Singapore Transverse Mercator",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["northing (N)",north,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["easting (E)",east,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Cadastre, engineering survey, topographic mapping."],
        AREA["Singapore - onshore and offshore."],
        BBOX[1.13,103.59,1.47,104.07]],
    ID["EPSG",3414]]

Notice that the EPSG code are in 3414 now.

4.0 Geospatial Visualization

After checking the coordinate reference system of each geospatial data frame, it is also useful for us to plot a map to show their spatial patterns.

DIY: Using the mapping methods you learned in Hands-on Exercise 3, prepare a map as shown below.

tm_shape(coastaloutline3414) +
  tm_polygons() + 
tm_shape(mpsz3414) +
  tm_polygons() + 
tm_shape(childcare3414)+
  tm_dots()

Notice that all the geospatial layers are within the same map extend. This shows that their referencing system and coordinate values are referred to similar spatial context. This is very important in any geospatial analysis.

Alternatively, we can also prepare a pin map using tmap_mode() with the view option for interactive mode.

tmap_mode("view")
tm_shape(childcare3414)+
  tm_dots()

Reminder: Always remember to switch back to plot mode after the interactive map. This is because, each interactive mode will consume a connection. You should also avoid displaying ecessive numbers of interactive maps (i.e. not more than 10) in one RMarkdown document when publish on Netlify.

tmap_mode("plot")

5.0 Geospatial Data wrangling

Converting simple feature data frames to sp’s Spatial* classes in geospatial analysis is necessary because many geospatial analysis packages in R, such as “spatial” and “raster,” require input geospatial data in the form of sp’s Spatial* classes.

5.1 Converting sf data frames to sp’s Spatial* class

Let’s convert sf data frames into sp’s Spatial* class using as_Spatial() of sf package.

childcare <- as_Spatial(childcare3414)
mpsz <- as_Spatial(mpsz3414)
coastaloutline <- as_Spatial(coastaloutline3414)
childcare
class       : SpatialPointsDataFrame 
features    : 2290 
extent      : 11810.03, 45404.24, 25596.33, 49300.88  (xmin, xmax, ymin, ymax)
crs         : +proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333 +k=1 +x_0=28001.642 +y_0=38744.572 +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs 
variables   : 2
names       :    Name,                                                                                                                                                                                                                                                                                                                                                                       Description 
min values  :   kml_1, <center><table><tr><th colspan='2' align='center'><em>Attributes</em></th></tr><tr bgcolor="#E3E3F3"> <th>CENTRE_NAME</th> <td>3-IN-1 FAMILY CENTRE</td> </tr><tr bgcolor=""> <th>CENTRE_CODE</th> <td>ST0027</td> </tr><tr bgcolor="#E3E3F3"> <th>INC_CRC</th> <td>DF7EC9C2478FA5A5</td> </tr><tr bgcolor=""> <th>FMEL_UPD_D</th> <td>20211201093631</td> </tr></table></center> 
max values  : kml_999,   <center><table><tr><th colspan='2' align='center'><em>Attributes</em></th></tr><tr bgcolor="#E3E3F3"> <th>CENTRE_NAME</th> <td>Zulfa Kindergarten</td> </tr><tr bgcolor=""> <th>CENTRE_CODE</th> <td>PT9603</td> </tr><tr bgcolor="#E3E3F3"> <th>INC_CRC</th> <td>527C1231DDD0FA64</td> </tr><tr bgcolor=""> <th>FMEL_UPD_D</th> <td>20211201093632</td> </tr></table></center> 
mpsz
class       : SpatialPolygonsDataFrame 
features    : 323 
extent      : 2667.538, 56396.44, 15748.72, 50256.33  (xmin, xmax, ymin, ymax)
crs         : +proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333 +k=1 +x_0=28001.642 +y_0=38744.572 +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs 
variables   : 15
names       : OBJECTID, SUBZONE_NO, SUBZONE_N, SUBZONE_C, CA_IND, PLN_AREA_N, PLN_AREA_C,       REGION_N, REGION_C,          INC_CRC, FMEL_UPD_D,     X_ADDR,     Y_ADDR,    SHAPE_Leng,    SHAPE_Area 
min values  :        1,          1, ADMIRALTY,    AMSZ01,      N, ANG MO KIO,         AM, CENTRAL REGION,       CR, 00F5E30B5C9B7AD8,      16409,  5092.8949,  19579.069, 871.554887798, 39437.9352703 
max values  :      323,         17,    YUNNAN,    YSSZ09,      Y,     YISHUN,         YS,    WEST REGION,       WR, FFCCF172717C2EAF,      16409, 50424.7923, 49552.7904, 68083.9364708,  69748298.792 
coastaloutline
class       : SpatialPolygons 
features    : 1 
extent      : 2667.538, 56396.44, 15748.72, 50256.33  (xmin, xmax, ymin, ymax)
crs         : +proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333 +k=1 +x_0=28001.642 +y_0=38744.572 +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs 

Notice that the geospatial data have been converted into their respective sp’s Spatial* classes now.

5.2 Converting the Spatial* class into generic sp format

spatstat requires the analytical data in ppp object form. There is no direct way to convert a Spatial* classes into ppp object. We need to convert the Spatial classes* into Spatial object first.

childcare_sp <- as(childcare, "SpatialPoints")
mpsz_sp <- as(mpsz, "SpatialPolygons")
coastaloutline_sp <- as(coastaloutline, "SpatialPolygons")
childcare_sp
class       : SpatialPoints 
features    : 2290 
extent      : 11810.03, 45404.24, 25596.33, 49300.88  (xmin, xmax, ymin, ymax)
crs         : +proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333 +k=1 +x_0=28001.642 +y_0=38744.572 +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs 
mpsz_sp
class       : SpatialPolygons 
features    : 323 
extent      : 2667.538, 56396.44, 15748.72, 50256.33  (xmin, xmax, ymin, ymax)
crs         : +proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333 +k=1 +x_0=28001.642 +y_0=38744.572 +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs 
coastaloutline_sp
class       : SpatialPolygons 
features    : 1 
extent      : 2667.538, 56396.44, 15748.72, 50256.33  (xmin, xmax, ymin, ymax)
crs         : +proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333 +k=1 +x_0=28001.642 +y_0=38744.572 +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs 

5.3 Converting the generic sp format into spatstat’s ppp format

Now, we will use as.ppp() function of spatstat to convert the spatial data into spatstat’s ppp object format.

childcare_ppp <- as(childcare_sp, "ppp")

Now, let us plot childcare_ppp and examine the different.

plot(childcare_ppp)

We can take a quick look at the summary statistics of the newly created ppp object by using summary().

summary(childcare_ppp)
Planar point pattern:  2290 points
Average intensity 2.875673e-06 points per square unit

*Pattern contains duplicated points*

Coordinates are given to 3 decimal places
i.e. rounded to the nearest multiple of 0.001 units

Window: rectangle = [11810.03, 45404.24] x [25596.33, 49300.88] units
                    (33590 x 23700 units)
Window area = 796335000 square units

Notice the warning message about duplicates. In spatial point patterns analysis, an issue of significant is the presence of duplicates. The statistical methodology used for spatial point patterns processes is based largely on the assumption that process are simple, that is, that the points cannot be coincident.

5.4 Handling Duplicated Points

To check for duplication in a ppp object, we can do the following:

any(duplicated(childcare_ppp))
[1] TRUE

To count the number of co-incidence point, we will use multiplicity().

multiplicity(childcare_ppp)

To know the total number of locations that have more than one point event, we can use sum():

sum(multiplicity(childcare_ppp))
[1] 3676

The output shows that there are 2451 duplicated point events.

To view the locations of these duplicate point events, we will plot the childcare data.

tmap_mode("view")
tm_shape(childcare) +
  tm_dots(alpha=0.4, 
          size=0.05)
tmap_mode("plot")

There are 3 ways to handle duplicated points:

  1. Delete the duplicates
  2. Use jittering
  3. Make each point “unique” and then attach the duplicates of the points to the patterns as marks

In this hands-on exercise, we will implement the jittering approach.

childcare_ppp_jit <- rjitter(childcare_ppp,
                             retry=TRUE,
                             nsim=1,
                             drop=TRUE)

DIY: Using the method you learned in previous section, check if any duplicated point in this geospatial data.

To check for any duplicated point, recall any(duplicated()).

any(duplicated(childcare_ppp_jit))
[1] FALSE

5.5 Creating owin object

When analysing spatial point patterns, it is a good practice to confine the analysis with a geographical area like Singapore boundary. In spatstat, an object called owin is specially designed to represent this polygonal region.

To convert sg SpatialPolygon object into owin object of spatstat, run the following:

sg_owin <- as(coastaloutline_sp, "owin")

The ouput object can be displayed by using plot() and summary():

plot(sg_owin)

summary(sg_owin)
Window: polygonal boundary
80 separate polygons (35 holes)
                  vertices         area relative.area
polygon 1                4  9.47108e+01      1.21e-07
polygon 2               37  1.29481e+04      1.66e-05
polygon 3               30  4.28933e+03      5.49e-06
polygon 4              145  9.61782e+05      1.23e-03
polygon 5              227  1.10308e+06      1.41e-03
polygon 6               19  3.09221e+04      3.95e-05
polygon 7               10  6.60195e+03      8.44e-06
polygon 8              234  2.08755e+06      2.67e-03
polygon 9               22  6.74651e+03      8.63e-06
polygon 10              71  5.63061e+03      7.20e-06
polygon 11              10  1.99717e+02      2.55e-07
polygon 12           14663  6.97996e+08      8.93e-01
polygon 13 (hole)        3 -2.05920e-03     -2.63e-12
polygon 14 (hole)        3 -2.89050e-05     -3.70e-14
polygon 15 (hole)        3 -2.83151e-01     -3.62e-10
polygon 16 (hole)        3 -3.99521e-02     -5.11e-11
polygon 17 (hole)        3 -4.95057e-02     -6.33e-11
polygon 18 (hole)        3 -3.65501e-03     -4.67e-12
polygon 19 (hole)        4 -2.05611e-02     -2.63e-11
polygon 20 (hole)        3 -1.68316e-04     -2.15e-13
polygon 21 (hole)       26 -1.25665e+03     -1.61e-06
polygon 22 (hole)        3 -2.18000e-06     -2.79e-15
polygon 23 (hole)        3 -6.62377e-01     -8.47e-10
polygon 24 (hole)        3 -2.09065e-03     -2.67e-12
polygon 25 (hole)       36 -7.79904e+03     -9.97e-06
polygon 26 (hole)        3 -8.83647e-03     -1.13e-11
polygon 27 (hole)        3 -2.21090e+00     -2.83e-09
polygon 28 (hole)       40 -6.00381e+03     -7.68e-06
polygon 29 (hole)        7 -1.40545e-01     -1.80e-10
polygon 30 (hole)       20 -4.39069e+00     -5.62e-09
polygon 31 (hole)       28 -1.99862e+01     -2.56e-08
polygon 32 (hole)       48 -1.38338e+02     -1.77e-07
polygon 33 (hole)      351 -1.21433e+03     -1.55e-06
polygon 34 (hole)       12 -8.36709e+01     -1.07e-07
polygon 35 (hole)      317 -5.11280e+04     -6.54e-05
polygon 36 (hole)       36 -4.01660e+04     -5.14e-05
polygon 37              30  2.80002e+04      3.58e-05
polygon 38              27  1.50315e+04      1.92e-05
polygon 39              15  4.03300e+04      5.16e-05
polygon 40            1045  4.44510e+06      5.68e-03
polygon 41 (hole)       13 -3.91907e+02     -5.01e-07
polygon 42              47  3.82087e+04      4.89e-05
polygon 43              65  8.42861e+04      1.08e-04
polygon 44             478  2.06120e+06      2.64e-03
polygon 45             266  1.50631e+06      1.93e-03
polygon 46             234  4.72886e+05      6.05e-04
polygon 47              14  5.86546e+03      7.50e-06
polygon 48              83  5.28920e+03      6.76e-06
polygon 49              75  1.73526e+04      2.22e-05
polygon 50             148  3.10395e+03      3.97e-06
polygon 51             142  3.22293e+03      4.12e-06
polygon 52              45  2.51218e+03      3.21e-06
polygon 53              40  1.38607e+04      1.77e-05
polygon 54              10  4.90942e+02      6.28e-07
polygon 55              95  5.96187e+04      7.62e-05
polygon 56 (hole)        4 -1.86410e-02     -2.38e-11
polygon 57              64  3.43149e+04      4.39e-05
polygon 58 (hole)        3 -1.98390e-03     -2.54e-12
polygon 59 (hole)        3 -5.55856e-03     -7.11e-12
polygon 60 (hole)        3 -5.12482e-03     -6.55e-12
polygon 61 (hole)        3 -1.96410e-03     -2.51e-12
polygon 62 (hole)        4 -1.13774e-02     -1.46e-11
polygon 63             155  2.67502e+05      3.42e-04
polygon 64             106  3.04104e+03      3.89e-06
polygon 65            1027  1.27782e+06      1.63e-03
polygon 66 (hole)        3 -3.23310e-04     -4.13e-13
polygon 67 (hole)        3 -1.16959e-03     -1.50e-12
polygon 68 (hole)        3 -1.46474e-03     -1.87e-12
polygon 69             211  4.70521e+05      6.02e-04
polygon 70               4  2.69313e+02      3.44e-07
polygon 71             132  9.53357e+04      1.22e-04
polygon 72               6  4.50259e+02      5.76e-07
polygon 73             285  1.61128e+06      2.06e-03
polygon 74              91  1.49663e+04      1.91e-05
polygon 75              71  8.18750e+03      1.05e-05
polygon 76             668  5.40368e+07      6.91e-02
polygon 77              77  3.29939e+05      4.22e-04
polygon 78             711  1.28815e+07      1.65e-02
polygon 79 (hole)        3 -3.41405e-01     -4.37e-10
polygon 80              44  2.26577e+03      2.90e-06
enclosing rectangle: [2667.54, 56396.44] x [15748.72, 50256.33] units
                     (53730 x 34510 units)
Window area = 781945000 square units
Fraction of frame area: 0.422

5.6 Combining point events object and owin object

In this last step of geospatial data wrangling, we will extract childcare events that are located within Singapore.

childcareSG_ppp <- childcare_ppp[sg_owin]

The output object combined both the point and polygon feature in one ppp object class.

summary(childcareSG_ppp)

DIY: Using the method you learned in previous exercise, plot the newly derived childcareSG_ppp as shown below.

plot(childcareSG_ppp)

6.0 1st Order Spatial Point Pattern Analysis (SPPA)

In this section, we will perform 1st order SPPA by using spatstat package. Our main focus will be:

  • Deriving kernel density estimation (KDE) layer for visualization and exploring the intensity of point processes,

  • Performing Confirmatory Spatial Point Patterns Analysis by using Nearest Neighbour statistics

6.1 Kernel Density Estimation (KDE)

What is Kernel Density Estimation? Kernel Density Estimation, KDE, is a non-parametric technique that estimates the probability density function of a continuous variable. In simple terms, it smooths out your data by placing a kernel (usually a Gaussian kernel is used as default) at each data point and then summing up these kernels to create a continuous curve. This curve offers a more refined view of how data points are distributed across the variable’s range

Read more about KDE here.

How does KDE actually works?

Fig 1. Equation for Gaussian kernel

Read more about how KDE works here.

6.1.1 Computing KDE using Automatic Bandwidth Selection method

kde_childcareSG_bw <- density(childcareSG_ppp,
                              sigma=bw.diggle,
                              edge=TRUE,
                              kernel="gaussian") 
plot(kde_childcareSG_bw)

Notice how the density values of the legend range from 0 to 0.000035. This is way too small to comprehend ! 👀

Why is this the case? This is because the default unit measurement for SVY 21 is in meter.

As a result, the density values computer is in ” umber of points per square meter”.

Note: We can retrieve the bandwidth used to compute the KDE layer using bw.diggle().

bw <- bw.diggle(childcareSG_ppp)
bw
   sigma 
295.4419 

6.1.2 Rescaling KDE values

Noting that the default unit measurement for SVY 21 is in meter, is it possible to convert the unit measurement into kilometer? To do so, we can use the rescale().

childcareSG_ppp.km <- rescale(childcareSG_ppp, 1000, "km")

Let’s run density() using childcareSG_ppp.km (a.k.a. rescaled dataset) and plot out the KDE map.

kde_childcareSG_bw <- density(childcareSG_ppp.km,
                              sigma=bw.diggle,
                              edge=TRUE,
                              kernel="gaussian")
plot(kde_childcareSG_bw)

Notice that output image looks identical to the earlier version, the only changes in the data values (refer to the legend).

6.1.3 Working with different automatic bandwidth methods

Besides bw.diggle(), there are three other spatstat functions that can be used to determine the bandwidth, they are:

  • bw.CvL(),

  • bw.scott(), and

  • bw.ppl()

Let’s take a look at the bandwidth returned by these automatic bandwidth calculation methods.

bw.diggle(childcareSG_ppp.km)
    sigma 
0.2954419 
bw.CvL(childcareSG_ppp.km)
  sigma 
4.54311 
bw.scott(childcareSG_ppp.km)
 sigma.x  sigma.y 
2.111666 1.347496 
bw.ppl(childcareSG_ppp.km)
    sigma 
0.2109003 

There is ongoing debate regarding the optimal methods for pattern detection. However, according to a study, it is recommended to employ bw.ppl() when dealing with patterns predominantly characterized by tight clusters. On the other hand, bw.diggle() is suggested for detecting a single tight cluster within a background of random noise. In comparing bw.diggle() and bw.ppl():

kde_childcareSG.ppl <- density(childcareSG_ppp.km, 
                               sigma=bw.ppl, 
                               edge=TRUE,
                               kernel="gaussian")
par(mfrow=c(1,2), mar = c(2, 2, 2, 2))
plot(kde_childcareSG_bw, main = "bw.diggle")
plot(kde_childcareSG.ppl, main = "bw.ppl")

6.1.4 Working with different kernel methods

The default kernel method employed in density.ppp() is Gaussian. However, there are three alternative options available:

  • Epanechnikov,

  • Quartic, and

  • Dics

par(mfrow=c(2,2), mar = c(2, 2, 2, 2))
plot(density(childcareSG_ppp.km, 
             sigma=bw.ppl, 
             edge=TRUE, 
             kernel="gaussian"), 
     main="Gaussian")
plot(density(childcareSG_ppp.km, 
             sigma=bw.ppl, 
             edge=TRUE, 
             kernel="epanechnikov"), 
     main="Epanechnikov")
plot(density(childcareSG_ppp.km, 
             sigma=bw.ppl, 
             edge=TRUE, 
             kernel="quartic"), 
     main="Quartic")
plot(density(childcareSG_ppp.km, 
             sigma=bw.ppl, 
             edge=TRUE, 
             kernel="disc"),
     main="Disc")

6.1.5 Computing KDE by using Fixed Bandwidth

Let’s compute a KDE layer by defining a bandwidth of 600 meters. We’ll use a sigma value of 0.6 in this case, as the unit of measurement of our childcareSG_ppp.km object is in kilometers, hence the 600m is 0.6km.

kde_childcareSG_600 <- density(childcareSG_ppp.km, sigma=0.6, edge=TRUE, kernel="gaussian")
plot(kde_childcareSG_600)

A downside of fixed bandwidth is that this method is very sensitive to highly skewed distributions.

6.1.6 Computing KDE by using Adaptive Bandwidth

One way to overcome this problem is by using adaptive bandwidth with density.adaptive() of spatstat package.

kde_childcareSG_adaptive <- adaptive.density(childcareSG_ppp.km, method="kernel")
plot(kde_childcareSG_adaptive)

We can now compare the results of fixed and adaptive kernel density estimation.

par(mfrow=c(1,2), mar = c(2, 2, 2, 2))
plot(kde_childcareSG_bw, main = "Fixed bandwidth")
plot(kde_childcareSG_adaptive, main = "Adaptive bandwidth")

6.1.7 Converting KDE output into grid object and grid object into raster

In order for a KDE output to be suitable for mapping purposes, we can convert it into a grid object.

Note: The results will be the same.

gridded_kde_childcareSG_bw <- as.SpatialGridDataFrame.im(kde_childcareSG_bw)
spplot(gridded_kde_childcareSG_bw)

Next, we will convert the gridded kernal density objects into RasterLayer object by using raster() of raster package.

kde_childcareSG_bw_raster <- raster(gridded_kde_childcareSG_bw)

Let’s take a look at the properties of kde_childcareSG_bw_raster RasterLayer.

kde_childcareSG_bw_raster
class      : RasterLayer 
dimensions : 128, 128, 16384  (nrow, ncol, ncell)
resolution : 0.419757, 0.2695907  (x, y)
extent     : 2.667538, 56.39644, 15.74872, 50.25633  (xmin, xmax, ymin, ymax)
crs        : NA 
source     : memory
names      : v 
values     : -1.227421e-14, 43.27275  (min, max)

Notice that the crs property is NA.

6.1.8 Assigning projection systems

To assign projection systems:

projection(kde_childcareSG_bw_raster) <- CRS("+init=EPSG:3414")
kde_childcareSG_bw_raster
class      : RasterLayer 
dimensions : 128, 128, 16384  (nrow, ncol, ncell)
resolution : 0.419757, 0.2695907  (x, y)
extent     : 2.667538, 56.39644, 15.74872, 50.25633  (xmin, xmax, ymin, ymax)
crs        : +proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333 +k=1 +x_0=28001.642 +y_0=38744.572 +ellps=WGS84 +units=m +no_defs 
source     : memory
names      : v 
values     : -1.227421e-14, 43.27275  (min, max)

Lastly, let’s visualize raster in cartographic quality map using tmap package.

tm_shape(kde_childcareSG_bw_raster) + 
  tm_raster("v") +
  tm_layout(legend.position = c("right", "bottom"), frame = FALSE)

Notice that the raster values are encoded explicitly onto the raster pixel using the values in “v”” field.

6.1.9 Comparing Spatial Point Patterns using KDE

Let’s compare the KDE of childcare at Punggol, Tampines, Chua Chu Kang and Jurong West planning areas.

6.1.9.1 Extracting study area

pg = mpsz[mpsz@data$PLN_AREA_N == "PUNGGOL",]
tm = mpsz[mpsz@data$PLN_AREA_N == "TAMPINES",]
ck = mpsz[mpsz@data$PLN_AREA_N == "CHOA CHU KANG",]
jw = mpsz[mpsz@data$PLN_AREA_N == "JURONG WEST",]

6.1.9.2 Plotting target planning areas

par(mfrow=c(2,2), mar = c(2, 2, 2, 2))
plot(pg, main = "PUNGGOL")
plot(tm, main = "TAMPINES")
plot(ck, main = "CHOA CHU KANG")
plot(jw, main = "JURONG WEST")

6.1.9.3 Converting the spatial point data frame into generic sp format

pg_sp = as(pg, "SpatialPolygons")
tm_sp = as(tm, "SpatialPolygons")
ck_sp = as(ck, "SpatialPolygons")
jw_sp = as(jw, "SpatialPolygons")

6.1.9.4 Creating owin object

pg_owin = as(pg_sp, "owin")
tm_owin = as(tm_sp, "owin")
ck_owin = as(ck_sp, "owin")
jw_owin = as(jw_sp, "owin")

6.1.9.5 Combining childcare points and the study area

childcare_pg_ppp = childcare_ppp_jit[pg_owin]
childcare_tm_ppp = childcare_ppp_jit[tm_owin]
childcare_ck_ppp = childcare_ppp_jit[ck_owin]
childcare_jw_ppp = childcare_ppp_jit[jw_owin]

We will use rescale() to transform the unit of measurement from metre to kilometre.

childcare_pg_ppp.km = rescale(childcare_pg_ppp, 1000, "km")
childcare_tm_ppp.km = rescale(childcare_tm_ppp, 1000, "km")
childcare_ck_ppp.km = rescale(childcare_ck_ppp, 1000, "km")
childcare_jw_ppp.km = rescale(childcare_jw_ppp, 1000, "km")

Lastly, let’s plot out the four study areas and the locations of the childcare centres.

par(mfrow=c(2,2), mar = c(2, 2, 2, 2))
plot(childcare_pg_ppp.km, main="PUNGGOL")
plot(childcare_tm_ppp.km, main="TAMPINES")
plot(childcare_ck_ppp.km, main="CHOA CHU KANG")
plot(childcare_jw_ppp.km, main="JURONG WEST")

6.1.9.6 Computing KDE of four planning areas using Automatic Bandwidth Selection method

We will now use the bw.diggle() automatic bandwidth selection to derive the bandwidths for each planning areas.

par(mfrow=c(2,2), mar = c(2, 2, 2, 2))
plot(density(childcare_pg_ppp.km, 
             sigma=bw.diggle, 
             edge=TRUE, 
             kernel="gaussian"),
     main="PUNGGOL")
plot(density(childcare_tm_ppp.km, 
             sigma=bw.diggle, 
             edge=TRUE, 
             kernel="gaussian"),
     main="TAMPINES")
plot(density(childcare_ck_ppp.km, 
             sigma=bw.diggle, 
             edge=TRUE, 
             kernel="gaussian"),
     main="CHOA CHU KANG")
plot(density(childcare_jw_ppp.km, 
             sigma=bw.diggle, 
             edge=TRUE, 
             kernel="gaussian"),
     main="JURONG WEST")

6.1.9.7 Computing KDE of four planning areas using Fixed Bandwidth Selection method

For comparison purposes, let’s also try computing the KDE layers by defining a fixed bandwidth of 250 meters.

par(mfrow=c(2,2), mar = c(2, 2, 2, 2))
plot(density(childcare_pg_ppp.km, 
             sigma=0.25, 
             edge=TRUE, 
             kernel="gaussian"),
     main="PUNGGOL")
plot(density(childcare_tm_ppp.km, 
             sigma=0.25, 
             edge=TRUE, 
             kernel="gaussian"),
     main="TAMPINES")
plot(density(childcare_ck_ppp.km, 
             sigma=0.25, 
             edge=TRUE, 
             kernel="gaussian"),
     main="CHOA CHU KANG")
plot(density(childcare_jw_ppp.km, 
             sigma=0.25, 
             edge=TRUE, 
             kernel="gaussian"),
     main="JURONG WEST")

Note: The sigma value of 0.25 in this case, as the unit of measurement of our childcare_tm_ppp.km object is in kilometers, hence the 250m is 0.25km.

6.2 Nearest Neighbor Analysis

In this section, we will be performing the Clark-Evans test of aggregation for SPPA, using the clarkevans.test() of statspat package.

To get started, let’s formulate our test hypotheses and state the confidence interval we are using:

  • H0 = The distribution of childcare services are randomly distributed.

  • H1= The distribution of childcare services are not randomly distributed.

  • The 95% confidence interval will be used.

Note:

Null Hypothesis (H0) – This can be thought of as the implied hypothesis. “Null” meaning “nothing”. This hypothesis states that there is no difference between groups or no relationship between variables. The null hypothesis is a presumption of status quo or no change.

Alternative Hypothesis (H1) – This is also known as the claim. This hypothesis should state what you expect the data to show, based on your research on the topic. This is your answer to your research question.

Read more about null and alternative hypotheses here.

6.2.1 Testing spatial point patterns using Clark-Evans Test

clarkevans.test(childcareSG_ppp,
                correction="none",
                clipregion="sg_owin",
                alternative=c("clustered"),
                nsim=99)
1
childcareSG_ppp: a spatial point pattern - object of class ppp,
2
correction: character string of the type of edge correction to be applied; correction="none" denotes no edge correction is applied,
3
clipregion a window for guard area correction - object of class owin,
4
alternative: string indicating the type of alternative for the hypothesis test. Partially matched; alternative=“clustered” denotes the alternative hypothesis is that R < 1 corresponding to a clustered point pattern
5
nsim: number of Monte Carlo simulations to perform

Note: If the argument clipregion is given, then the selected edge corrections will be assumed to include correction="guard".

6.2.1.1 Performing Clark-Evans Test in Choa Chu Kang planning area

clarkevans.test(childcare_ck_ppp,
                correction="none",
                clipregion=NULL,
                alternative=c("two.sided"),
                nsim=999)

    Clark-Evans test
    No edge correction
    Z-test

data:  childcare_ck_ppp
R = 0.77501, p-value = 4.894e-05
alternative hypothesis: two-sided

6.2.1.2 Performing Clark-Evans Test in Tampines planning area

clarkevans.test(childcare_tm_ppp,
                correction="none",
                clipregion=NULL,
                alternative=c("two.sided"),
                nsim=999)

    Clark-Evans test
    No edge correction
    Z-test

data:  childcare_tm_ppp
R = 0.59762, p-value < 2.2e-16
alternative hypothesis: two-sided